Method for Preparing Sequence Tags

ABSTRACT

Means to circularize any nucleic acid molecule and to obtain from such circular nucleic acid molecules fragments that mark both ends of the initial nucleic acid molecule are provided. Also provided are means of high value to studies including, but not limited to, expression profiling, splicing, promoter identification, identification of genetic elements, and beyond, which are essential components of commercial applications and services including, but not limited to, drug development, diagnostics, or forensic studies.

FIELD OF THE INVENTION

The invention relates to the identification of nucleic acid molecules and cloning of fragments thereof. Information on such fragments can be related to functional regions within genomes or transcribed regions. Furthermore, the invention relates to the analysis of fragments for the purpose of gene identification and expression profiling. Thus, the present invention allows for studies on biological systems, the characterization of genetic elements, and the analysis of genes expressed therein.

BACKGROUND ART

Genomes contain the essential genetic information for development and homeostasis of any living organism. For an understanding of biological phenomena, knowledge is required on how such genetic information is utilized in a cell or tissue at a given time point. It is known that mistakes in the utilization of genetic information and related regulatory pathways may in many cases cause disease in humans, plants, and animals. Thus, a method is needed for expression profiling and annotation of the identified transcripts as well as for characterizing genetic elements under the control of the genetic information. Most expression studies nowadays use either approaches based on in situ hybridization, e.g. microarrays, or those based on high-throughput sequencing of short tags, e.g. SAGE, CAGE, MPSS. The two types of approaches have distinct advantages over each other. However, for our understanding of the regulatory principles behind gene expression, it is desirable to also obtain information on the genetic elements which control gene expression.

High-throughput expression profiling is commonly performed by the use of so-called DNA microarrays (Jordan B., DNA Microarrays: Gene Expression Applications, Springer-Verlag, Berlin Heidelberg New York, 2001; Schena A, DNA Microarrays, A Practical Approach, Oxford University Press, Oxford 1999, both hereby incorporated herein by reference). For such experiments specific probes representing individual genes or transcripts are placed on a support and simultaneously hybridized with a plurality of samples. Positive signals are obtained where a probe on the support reacts with a molecule presented with the sample. These experiments allow the parallel analysis of a large number of genes or transcripts. However, the approach is limited by the fact that only those genes or transcripts can be studied which were initially identified by other experimental means. Such means can include cDNA libraries, partial sequence tags and/or results obtained from computer predictions. In the future, the concept of tiled arrays may also allow for an unbiased expression profiling in organisms for which genomic sequences are available (Kapranov P. et al., Science 296, 916-919 (2002), hereby incorporated herein by reference). However, as tiled arrays present genomic sequences as such, data from those experiments are difficult to interpret where multiple transcripts are derived from the same region within the genome. Thus tiled arrays can provide information on which regions within genomes are actively transcribed, but in high-throughput expression profiling experiments they fall short on the characterization of individual transcripts.

Due to the limitations of DNA microarray experiments, alternative approaches are in use for gene discovery and expression profiling, which are based on partial sequences, referred to as tags, obtained from a plurality of mRNA samples. The so-called SAGE (Serial Analysis of Gene Expression) method is known as an efficient method for obtaining partial information on the base sequences in mRNAs (Velculescu V. E. et al., Science 270, 484-487 (1995), hereby incorporated herein by reference). This method forms DNA concatemers by ligating multiple short DNA fragments (initially about 10 bp) containing information on the base sequences at the 3′-end of multiple mRNAs, and determines the base sequences in these DNA concatemers. Recently an improved version of SAGE, the so-called LongSAGE, has been published, which allows for the cloning of longer SAGE tags (Saha S. et al., Nat. Biotechnol. 20, 508-12 (2002), US patent applications 20030008290, 20030049653, all hereby incorporated herein by reference). The SAGE method is currently in wide use as an important method for analyzing genes expressed in specific cells, tissues or organisms; and SAGE tags are available for reference in the public domain, e.g. under http://cgap.nci.nih.gov/SAGE.

U.S. Pat. Nos. 6,352,828, 6,306,597, 6,280,935, 6,265,163, and 5,695,934, all hereby incorporated herein by reference, disclosed a different approach for the high-throughput sequencing of short sequence tags, also denoted as Massively Parallel Signature Sequencing or “MPSS”. As described in further detail in Brenner S., et al., Nat. Biotechnol. 18, 630-634 (2000), and Brenner S., et al., Proc. Natl. Acad. Sci. USA 97, 1655-1670 (2000), both hereby incorporated herein by reference, preferentially short sequences from the 3′-end of transcripts are obtained in a highly parallel manner by performing cycles with different enzymatic reactions on a single layer of beads.

As both of the aforementioned approaches focused on the utilization of 3′-end derived sequence tags, new approaches have been developed to also obtain sequence tags from other regions of transcripts, in particular the 5′-ends. Such an approach has been disclosed in PCT/JP03/07514, and Shiraki T. et al., Proc. Natl. Acad. Sci. USA 100, 15776-15781 (2003), both hereby incorporated herein by reference. This so-called CAGE (Cap-Analysis-Gene-Expression) approach allows for the cloning of 5′-end specific tags into concatemers similar to the SAGE technology, where the so-called CAGE tags enable not only the detection of transcripts and their expression profiling, but further provide information on transcriptional start sites to allow for mechanistic studies on the regulation of transcription or a higher annotation of transcripts.

However, each of the above approaches focuses only on the cloning and sequencing of one sequence tag per nucleic acid molecule. Such approaches do not always allow for a correct analysis of the information, as the sequence information within a single tag is often not sufficient for mapping to the genome or for other approaches in bioinformatics. Therefore, it is desirable to not only have a tag from one region within a nucleic acid molecule, but to be able to clone both ends of the nucleic acid molecule in such a way that the tags derived from such an approach would allow for the identification of the ends of nucleic acid molecules.

SUMMARY OF THE INVENTION

Here, the present invention provides means to circularize any nucleic acid molecule and obtain from such circular nucleic acid molecules fragments that mark the two ends of the initial nucleic acid molecule. Thus, the invention represents a great improvement in the analysis of genomic or transcribed genetic information, and nucleic acid molecules derived therefrom. The invention provides a further means of high value to studies including, but not limited to, expression profiling, splicing, promoter identification, identification of genetic elements, and beyond, which are essential components of commercial applications and services including, but not limited to, drug development, diagnostics, or forensic studies.

The invention relates to methods for the isolation of fragments from nucleic acid molecules for the purpose of cloning and analysis. Thus, the invention relates to the conversion of a sample containing one or more nucleic acid molecules, and such nucleic acid molecules or any mixture of nucleic acid molecules would be converted into DNA.

In one embodiment the invention relates to the manipulation of nucleic acid molecules that provides linear nucleic acid molecules containing information on the opposite end sequences of a target nucleic acid molecule in the form of linear double-stranded DNA.

The present invention provides a method for preparing DNA fragments comprising sequences corresponding to two opposite end regions of a linear nucleic acid molecule, comprising the steps of: creating a linear DNA molecule from a nucleic acid molecule; ligating linkers to two opposite ends of the linear DNA molecule, wherein such linkers contain a cloning site and a recognition site for a restriction endonuclease that cleaves at a site outside its recognition site and within the linear DNA molecule; circularizing the linear DNA molecule by closing the linear DNA molecule at its cloning site so as to form a circular DNA molecule; digesting the circular DNA molecule with a restriction endonuclease that cleaves at a site outside its recognition site and cuts out a DNA fragment from the circular DNA molecule, wherein the DNA fragment comprises opposite end regions of the linear DNA molecule; and isolating the DNA fragment.
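The steps recited above can be illustrated in silico. The following is a minimal sketch under stated assumptions, not the claimed procedure itself: it models the top strand only, ignores the staggered overhangs such endonucleases leave, and assumes that the recognition site in each linker directly abuts the linear DNA molecule, so that the enzyme cuts a fixed number of nucleotides into it. The function name, the `reach` parameter and the example linker sequences are hypothetical.

```python
def excise_end_fragment(insert: str, left_linker: str, right_linker: str, reach: int) -> str:
    """Return the top strand of the fragment cut out of the circularized molecule.

    Simplifying assumptions (not taken from the specification): top strand
    only, blunt cuts, and a cut exactly `reach` nucleotides into the insert
    on either side of the linker-derived spacer.
    """
    if reach <= 0 or len(insert) < 2 * reach:
        raise ValueError("insert must be at least twice as long as the cut distance")

    # Linear molecule after linker ligation: left linker + insert + right linker.
    # Closing it at the cloning site joins the right linker to the left linker,
    # so the circle can be written starting at the first base of the insert.
    circle = insert + right_linker + left_linker

    # The fragment starts `reach` bases before the 3' end of the insert, runs
    # through the spacer and ends `reach` bases into the 5' end of the insert.
    start = len(insert) - reach
    length = reach + len(right_linker) + len(left_linker) + reach

    # Doubling the string lets a plain slice run across the circle's origin.
    return (circle + circle)[start:start + length]


if __name__ == "__main__":
    fragment = excise_end_fragment(
        insert="AAAAACCCCCGGGGGTTTTT",
        left_linker="GATCTCCGAC",    # hypothetical linker sequence
        right_linker="GTCGGAGATC",   # hypothetical linker sequence
        reach=5,
    )
    print(fragment)
```

In this model the isolated fragment reads 3′-end tag, spacer, 5′-end tag, so the two end regions of one molecule stay physically linked through the spacer.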

The invention involves the manipulation of double-stranded DNA by the addition of specific linkers to opposite ends of such a double-stranded DNA molecule, where such linkers would provide a means for the further amplification, manipulation and/or purification of the double-stranded DNA molecule. The linkers as attached to the ends of a double-stranded DNA molecule would provide the necessary means to allow for the circularization of the DNA molecule. Thus, the invention provides a means for the conversion of linear DNA into circular DNA and the amplification of such circular DNA.

Further, the invention involves steps to manipulate DNA fragments in such a way that linkers are attached to their ends. Such linkers would contain a recognition site for a Class IIs or Class III enzyme adjacent or close to their cloning sites. Thus, the linkers provide the necessary means to cleave out fragments or tags from the ends of DNA molecules. The invention utilizes the isolation of tags from ends of nucleic acid molecules. Such regions can be derived from different experimental approaches and allow for the characterization of the origin of the initial nucleic acid molecules. Due to the circularization steps, the tags derived from the ends of the same linear DNA molecule are linked to each other by a spacer derived from linker sequences. Thus, the invention provides a means for the preparation of a new type of sequence tag, the so-called GSC-tag (Gene-Scanning-CAGE-tag), which allows for the identification and characterization of nucleic acid molecules by their end sequences. Furthermore, GSC-tags are prepared in such a way that related tags from the same nucleic acid molecule are combined in the same GSC-tag, and that the spacer sequence connecting the two tags from the ends would allow for the labeling of the GSC-tag by a short sequence tag.

Further, the invention involves the cloning of the tags derived from the DNA molecules. Such tags are purified and cloned as concatemers into tag libraries, referred to as GSC-libraries, for easier manipulation and sequencing. Thus, the invention provides a means for the high-throughput sequencing of tags derived from the ends of nucleic acid molecules.

In an embodiment the invention relates to the cloning of tags from different samples. A label would mark the origin of each molecule within such a mixed tag library. Similarly, tags prepared by different approaches can be individually labeled and used for the preparation of pooled libraries. Thus, the invention relates to the labeling of tags by defined sequences, where such sequences are introduced during the linker ligation and/or circularization steps before cloning into concatemers.
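How such labels could be read back from a pooled library can be sketched as follows. This is only an illustration under assumptions that the specification does not fix: each sequenced fragment is taken to be laid out as 3′-end tag, spacer, 5′-end tag with tags of a known length, and the label is taken to be a short sequence occurring somewhere within the spacer. The function name, the label sequences and the sample names are hypothetical.

```python
from collections import defaultdict


def sort_tags_by_label(fragments, labels, tag_len):
    """Group end-tag pairs by the sample label found in their spacer.

    `fragments` are top-strand sequences laid out as
    3'-end tag + spacer + 5'-end tag, each tag being `tag_len` long (> 0).
    `labels` maps a label sequence to a sample name.
    """
    by_sample = defaultdict(list)
    for fragment in fragments:
        if len(fragment) <= 2 * tag_len:
            raise ValueError("fragment too short to contain two tags and a spacer")
        three_prime_tag = fragment[:tag_len]
        spacer = fragment[tag_len:-tag_len]
        five_prime_tag = fragment[-tag_len:]
        matches = [name for seq, name in labels.items() if seq in spacer]
        # A spacer matching no label, or more than one, is kept apart.
        sample = matches[0] if len(matches) == 1 else "unassigned"
        by_sample[sample].append((five_prime_tag, three_prime_tag))
    return dict(by_sample)


if __name__ == "__main__":
    labels = {"ACGT": "liver", "TTGA": "brain"}  # hypothetical labels
    fragments = [
        "AAAAA" + "GGACGTGG" + "CCCCC",
        "TTTTT" + "GGTTGAGG" + "GGGGG",
        "AAAAA" + "GGGGGGGG" + "TTTTT",
    ]
    for sample, pairs in sort_tags_by_label(fragments, labels, tag_len=5).items():
        print(sample, pairs)
```

Matching by simple substring search is the least robust choice available; real label sets would normally be designed so that sequencing errors cannot turn one label into another.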

In another embodiment, the invention relates to the sequencing of the tags to allow for their annotation by computational means and their statistical analysis. Thus, the invention relates to a means for gene discovery, gene identification, gene expression profiling, and annotation.

In yet another embodiment, the invention relates to the sequencing of the tags to allow for their annotation by computational means and their statistical analysis. Such tags could be derived from regions within genomes. Thus, the invention relates to the characterization of genetic elements within genomes.

In yet a different embodiment, the invention relates to the preparation of hybridization probes from the ends of nucleic acid molecules. Such regions can be analyzed by means of in situ hybridization. In a preferred embodiment, the in situ hybridization experiment makes use of a tiled array.

In yet one more embodiment, the invention relates to the full-length cloning of nucleic acid molecules. The sequence information obtained from the tags is used for primer design, and such primers are used to amplify the nucleic acid molecule in an amplification reaction. It is within the scope of the invention to amplify and clone in such a way transcribed regions as well as genomic fragments, where such fragments can contain genetic elements or promoter regions.
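The primer design step can be sketched as follows, assuming that both tags are given as top-strand sequences in the orientation of the original molecule: the forward primer is then read directly from the 5′-end tag, and the reverse primer is the reverse complement of the 3′-end tag. The function names and the default primer length are hypothetical, and the sketch deliberately ignores melting temperature, secondary structure and the other criteria a real primer design would apply.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def primers_from_tags(five_prime_tag: str, three_prime_tag: str, length: int = 18):
    """Derive a primer pair from the two end tags of one molecule.

    Both tags are expected as top-strand sequences. Primers are truncated
    to `length` nucleotides, or to the tag length if the tag is shorter.
    """
    forward = five_prime_tag[:length]
    # The reverse primer anneals to the top strand at its 3' end, so it is the
    # reverse complement of the last `length` bases of the 3'-end tag.
    reverse = reverse_complement(three_prime_tag)[:length]
    return forward, reverse


if __name__ == "__main__":
    fwd, rev = primers_from_tags("ATGGCCAAGT", "GGTTCCTAAA", length=6)
    print(fwd, rev)
```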

Thus, the invention provides means for the analysis of nucleic acid molecules and short fragments thereof as needed, for example, for the characterization of biological samples.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a schematic diagram showing the first-strand cDNA priming and poly-A tail removal.

FIG. 2 is a schematic diagram showing the linker ligation step.

FIG. 3 is a schematic diagram showing the amplification step.

FIG. 4 is a schematic diagram showing the digestion and concatenation steps.

FIG. 5 is a schematic diagram showing the cloning steps.

FIG. 6 shows vector pGSC.

FIG. 7 is a diagram showing the targeting of non-polyadenylated RNA.

FIG. 8 is a diagram showing the preparation of hybridization probes.

FIG. 9 shows in situ hybridization using tiled arrays.

DETAILED DESCRIPTION OF THE INVENTION

The invention encompasses a method for handling single-stranded as well as double-stranded nucleic acids in the form of linear and circular nucleic acid molecules. Double-stranded DNA means any nucleic acid molecules each of which is composed of two polymers formed by deoxyribonucleotides and in which the two polymers have substantially complementary sequences to each other allowing for their association to form a dimeric molecule. The two polymers are bound to one another by specific hydrogen bonds formed between matching base pairs within the deoxyribonucleotides. Any DNA molecule composed only of one polymer chain formed by two or more deoxyribonucleotides having no matching complementary DNA molecule to associate with is considered to be a single-stranded DNA molecule for the purpose of the invention, even if such a molecule may form secondary structures comprising double-stranded DNA portions. As used interchangeably herein, the terms “nucleic acid molecule(s)” and “polynucleotide(s)” include RNA or DNA regardless of single or double-stranded, coding or non-coding, complementary or not, and sense or antisense, and also include hybrid sequences thereof. In particular, it encompasses genomic DNA and complementary DNA which are transcribed or non-transcribed, spliced or not spliced, incompletely spliced or processed, independent from its origin, cloned from a biological material, or obtained by means of synthesis. RNA for the purpose of the invention is considered a single-stranded nucleic acid molecule even where such a molecule may form secondary structures comprising double-stranded RNA portions. 
In particular, RNA encompasses for the purpose of the invention any form of nucleic acid molecule comprised of ribonucleotides, and does not relate to a particular sequence or origin of the RNA. Thus, RNA can be transcribed in vivo or in vitro by artificial systems, or non-transcribed, spliced or not spliced, incompletely spliced or processed, independent from its natural origin or derived from artificially designed templates, mRNA, tRNA, rRNA, obtained by means of synthesis, or any mixture thereof. More precisely, the expressions “DNA”, “RNA”, “nucleic acid”, and “sequence” encompass nucleic acid materials themselves and are thus not restricted to particular sequence information, vector, phagemid or any other specific nucleic acid molecule. The term “nucleic acid” is also used herein to encompass naturally occurring nucleic acids, artificially synthesized or prepared nucleic acids, and any modified nucleic acids into which one or more modifications have been introduced by naturally occurring events or through approaches known to a person skilled in the art. Similarly, a “tag” according to the invention can be any region of a nucleic acid molecule as prepared by the means of the invention, where the term “tag” as used herein encompasses any nucleic acid fragment, no matter whether it is derived from naturally occurring, artificially synthesized or prepared nucleic acids, or any modified nucleic acids into which one or more modifications have been introduced by naturally occurring events or through approaches known to a person skilled in the art. Furthermore, the term “tag” does not relate to any particular sequence information or its composition but to the nucleic acid molecules as such. The terms “purity”, “enriched”, “purification”, “enrichment”, or “selection” are used interchangeably herein and do not require absolute purity or enrichment of a product but rather are intended as relative definitions. 
The terms “specific”, “preferable”, or “preferential” are used interchangeably herein and do not require absolute specificity of a DNA or RNA hybridization probe, or an enzyme for its substrate or an activity, but rather they are intended to have relative definitions which include the possibility that an enzyme may have low or lower affinity to other compounds related or unrelated to its substrate. Similarly, the terms used to name an enzyme, or an enzymatic activity, are used herein to describe the function or activity of such a component, but do not require the absolute purity of such a component. Thus, any mixture containing such an enzyme, enzymatic activity, or mixtures thereof with other components of the same, related or unrelated function are within the scope of the invention. Similarly, DNA or RNA molecules may function in a specific manner as hybridization probes, and as such are referred to as “complementary sequences” for the purpose of the invention, or in experiments where such probes are applied for the detection of a related nucleic acid molecule, even where such a probe and the target molecule may be distinct by naturally occurring or artificially introduced mutations in individual positions. The term “biological samples” includes any kind of material obtained from living organisms including microorganisms, animals, and plants, as well as any kind of infectious particles including viruses and prions, which depend on a host organism for their replication. As such “biological samples” include any kind of material obtained from a patient, animal, plant or infectious particle for the purpose of research, development, diagnostics or therapy. Thus, the invention is not limited to the use of any particular nucleic acid molecules or their origin, but the invention provides general means to be applied to and used for the work on and the manipulation of any given nucleic acid. 
Any such nucleic acid molecules as applied to perform the invention can be obtained or prepared by any method known to a person skilled in the art including, but not limited to, those described by Sambrook J. and Russell D. W., Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Laboratory Press, New York, 2001, hereby incorporated herein by reference.

The invention relates to methods for the isolation of fragments from nucleic acid molecules for the purpose of cloning and analysis. Thus the invention relates to the conversion of a sample containing one or more nucleic acid molecules, where such nucleic acid molecules or any mixture of nucleic acid molecules would be converted into DNA. To perform the invention, nucleic acid molecules can be derived from any naturally occurring genomic DNA, an RNA sample, or an existing DNA library, can be of artificial origin, or can be any mixture thereof. The invention is not limited to the use of an individual nucleic acid molecule or any plurality of nucleic acid molecules, but the invention can be performed on an individual nucleic acid molecule or any plurality of nucleic acid molecules regardless of whether such pluralities would occur in nature, be derived from an existing library, or be artificially created. Furthermore, the invention can process any nucleic acid molecule regardless of its origin or nature. Thus it is within the scope of the invention that the nucleic acid molecules could be full-length molecules as compared to naturally occurring nucleic acid molecules, or any fragment thereof. Furthermore, it can be envisioned that such fragments of nucleic acid molecules could be prepared by a random process or by a targeted dissection of nucleic acid molecules by means of an enzymatic activity with a preference for a certain sequence, or by means which would allow for the fragmentation based on the structure of the nucleic acid molecule including, but not limited to, exons and introns within transcribed regions. Thus the invention is not restricted to the use of any particular starting material.

The invention is not dependent on the use of DNA only, as a person familiar with the state of the art will know different approaches to convert RNA into DNA including, but not limited to, those approaches disclosed by Sambrook J. and Russell D. W., ibid, hereby incorporated herein by reference. After conversion of RNA into DNA, a single-stranded or double-stranded DNA molecule having the same or complementary sequence to the original RNA can be obtained, referred to as cDNA. Such cDNA molecules are commonly prepared in the form of linear DNA, where the two open ends allow for their manipulation. However, even where cDNAs are cloned into a vector, a person skilled in the art will know about the necessary means to release an insert from such a vector to convert it into linear DNA.

In one embodiment of the invention, parts of the sequencing tags are derived from the 3′-end of transcripts. For the cloning of tags derived from the actual 3′-end of mRNAs, it is important to remove polyA-tails from the RNA to obtain meaningful information. One approach for the removal of polyA tails has been published by Shibata Y. et al., Biotechniques, 1042 to 1044, 1048-1049 (2001), hereby incorporated herein by reference, which can be applied for the cloning of 3′-end related tags (compare to FIG. 1). The primer as used for the first-strand cDNA synthesis has a recognition site for the Class IIs restriction enzyme GsuI, which will cleave the resulting double-stranded cDNA 14/16 bp from its recognition site, which is adjacent to an oligo-dT stretch of 14 nucleotides used in the priming step. After cDNA synthesis GsuI is used to cut off the remaining poly-dA/dT stretch between the 3′-end of the cDNA and its recognition site. The cohesive end created by GsuI digestion can then be used for 3′-end-specific linker ligation, where such a linker could contain a Class IIs or Class III recognition site adjacent or close to the ligation site for cutting out a sequencing tag, a cloning site, and/or a label for the purification of such a tag. Thus the invention provides means for the removal of polyA-tails from 3′-ends to allow for a meaningful analysis of mRNAs. In yet a different embodiment, the invention provides means for the 3′-end specific priming of non-polyadenylated RNA. In this embodiment of the invention, a double-stranded linker having a random single-stranded overhang is ligated to the 3′-end of an RNA molecule (FIG. 7 a). Such linkers can be designed similar to other approaches known to a person familiar with the state of the art including, but not limited to, the method described by Shibata Y. et al., Biotechniques 30, 1250-1254 (2001), hereby incorporated herein by reference. 
The 3′-end specific linker as used for the priming of the cDNA synthesis could further contain a Class IIs or Class III recognition site for cutting out the sequencing tag from the 3′-end of the ligation product, a cloning site, and/or a label for the purification of such a tag. Thus the invention provides means for the preparation of possibly full-length cDNA from non-polyadenylated RNA. Furthermore, the same linker ligation step can be applied to block the cDNA synthesis of polyadenylated RNA. In such an embodiment of the invention, a double-stranded linker having a single-stranded oligo-dT overhang is ligated to the 3′-end of an RNA molecule (FIG. 7 b). Due to the oligo-dT overhang, such a linker would preferentially be ligated to polyadenylated RNA. However, in contrast to the aforementioned linker having random overhangs, the 3′-end of the oligo-dT overhang would be blocked, for example by the use of a dideoxy nucleotide in the last position. Thus, such a modified linker would no longer enable strand extension. In addition the 5′-end of the upper strand of such a linker could be modified in such a way that a specific binding substance would be attached to it, where such a specific binding substance would allow for the selective removal of polyadenylated RNA by means of a high affinity ligand binding to the specific binding substance. Many combinations of a specific binding substance and a high affinity ligand are known to a person familiar with the state of the art including, but not limited to, the use of biotin and streptavidin, or digoxigenin and an anti-digoxigenin antibody. In this way, the invention provides means for the selective priming of non-polyadenylated RNA, and the separation of such RNA from polyadenylated RNA. Thus the invention provides means for the cloning and analysis of real 3′-ends of nucleic acid molecules including any type of RNA.

In a different embodiment of the invention, the sequencing tags are obtained from the 5′-end of transcripts. Different approaches for the utilization of 5′-end-specific sequence tags have been disclosed in PCT/JP03/07514, and Shiraki T. et al., ibid, both hereby incorporated herein by reference. All such approaches make use of the 5′-end-specific cap structure of mRNA molecules, which can be used to selectively enrich 5′-ends or full-length mRNA molecules. As is well known to a person familiar with the state of the art, such approaches include but are not limited to the cap trapper method (Carninci P. et al., Methods in Enzymology, 303, pp. 19-44, 1999, hereby incorporated herein by reference), oligo-capping (Maruyama K., Sugano S., Gene 138, 171-174 (1994), hereby incorporated herein by reference), use of a cap-binding protein (Edery I. et al., Mol Cell Biol. 15, 3363-3371 (1995), hereby incorporated herein by reference), use of an antibody that specifically binds to the cap structure (Theissen H. et al., EMBO J. 12, 3209-3217 (1986), hereby incorporated herein by reference), oxidation of the cap structure followed by adding an oligonucleotide to the cap structure (U.S. Pat. No. 6,022,715, hereby incorporated herein by reference), or the cap-switch method disclosed in U.S. Pat. No. 5,962,272, hereby incorporated herein by reference. Any of the aforementioned approaches allows for the selection of the 5′-ends, followed by the ligation of a linker to the 5′-end of transcripts, where such a linker would contain a Class IIs or Class III recognition site for cutting out a sequencing tag, a cloning site, and/or a label for the purification of such a tag. Thus in this embodiment of the invention, the cap structure would be used to direct the linker, and to assure the capturing of full-length transcripts. Thus the invention provides means for capturing true 5′-ends of transcribed regions.

In one embodiment the invention relates to the manipulation of nucleic acid molecules, where such nucleic acid molecules would be prepared in the form of linear double-stranded DNA. Such double-stranded DNA can be derived from RNA, and be prepared according to any of the aforementioned approaches, or can be taken from any other source, which allows for the isolation of double-stranded or single-stranded DNA from resources including, but not limited to, genomic DNA, cDNA, cloned DNA or any fragment or mixtures thereof. Thus the invention is not limited to a certain source of nucleic acid, but any nucleic acid molecule as such or any mixture thereof can be applied to perform the invention. Furthermore, as the invention can be applied to the use of single-stranded RNA and DNA, it is within the scope of the invention to manipulate the complexity of single-stranded nucleic acid molecules by means of subtraction, normalization or selective enrichment by any of the methods known to a person skilled in the art including, but not limited to, the approaches published by Carninci P. et al., Genome Res. 10, 1617-1630 (2000), hereby incorporated herein by reference (compare FIG. 1). Independent from the starting material used to perform the invention, the single-stranded first-strand cDNA material can be fractionated by means of subtractive hybridizations and physical separation to allow for enrichment of nucleic acid molecules of differentially expressed genes or for the concentration of transcripts of low abundance. Thus the invention relates to means to process pluralities of nucleic acid molecules for the purpose of their analysis and cloning.

In yet a different embodiment, the invention relates to the manipulation of double-stranded DNA by the addition of specific linkers to both ends of such a double-stranded DNA molecule, where such linkers would provide means for the further amplification, manipulation and/or purification of the double-stranded DNA molecule. Such a linker or linkers can be directly attached to double-stranded DNA in a ligation reaction, be introduced by the ligation of a double-stranded linker having a single-stranded overhang to single-stranded DNA, or be introduced as part of the primer used to drive the DNA synthesis from an RNA or DNA template. The linkers as attached to the ends of a double-stranded DNA molecule would preferably be composed of double-stranded DNA. Any such linker, independently of the way of usage or the way it was introduced or attached to the nucleic acid molecule, would contain certain features for the manipulation of the double-stranded DNA molecule. Such features could include, but are not limited to, recognition sites for restriction endonucleases, regions complementary to primers used in an amplification reaction, and labeling with selective binding substances including, but not limited to, biotin or digoxigenin. Furthermore, such a linker can contain information for the labeling of the attached DNA molecules, where such a label would be encoded by a short sequence within one or both linker molecules, and a recognition site for an endonuclease which cleaves outside of its recognition site. In a preferable embodiment, such a recognition site would be adjacent to the junction point between the nucleic acid molecule and the linker. In a different embodiment, such a recognition site would be close or very close to the junction point between the nucleic acid molecule and the linker, where the recognition site and the nucleic acid molecule would be separated by one (1), two (2), three (3), four (4), five (5) or even six (6) nucleotides. 
In a preferable embodiment, the endonuclease which cleaves outside of its recognition site is a Class IIs or a Class III enzyme. In an even more preferable embodiment, the endonuclease which cleaves outside of its recognition site is one out of GsuI, MmeI, BpmI, BsgI, or EcoP15I. Thus the invention provides means for the labeling of nucleic acid molecules, in particular where nucleic acid molecules of different origin are mixed for the purpose of their analysis or cloning, where such labels are introduced by a linker or are derived therefrom.
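The practical effect of placing the recognition site adjacent to the junction, or one to six nucleotides away from it, is that every nucleotide of separation shortens the tag that the enzyme can cut out of the nucleic acid molecule. A minimal sketch of this arithmetic follows. The cut distances in the table are commonly cited values and are assumptions here, not figures taken from this specification (which states only the 14/16 bp cut for GsuI); they should be checked against supplier data for the enzyme actually used. The function and table names are hypothetical.

```python
# (top-strand, bottom-strand) cut distances in nucleotides downstream of the
# recognition site. ASSUMED, commonly cited values; verify before relying on them.
CUT_DISTANCES = {
    "GsuI": (16, 14),
    "BpmI": (16, 14),
    "BsgI": (16, 14),
    "MmeI": (20, 18),
    "EcoP15I": (25, 27),
}


def double_stranded_tag_length(enzyme: str, spacing: int = 0) -> int:
    """Length of the fully double-stranded tag cut out of the target molecule.

    `spacing` is the number of linker nucleotides between the recognition
    site and the junction with the target (0 means directly adjacent).
    """
    if not 0 <= spacing <= 6:
        raise ValueError("spacing is expected to be between 0 and 6 nucleotides")
    top, bottom = CUT_DISTANCES[enzyme]
    # The shorter of the two cut distances bounds the double-stranded part;
    # nucleotides spent on the spacing are linker, not tag.
    return min(top, bottom) - spacing


if __name__ == "__main__":
    for name in CUT_DISTANCES:
        print(name, [double_stranded_tag_length(name, s) for s in range(7)])
```

Under these assumed values, an adjacent site gives the longest tag for each enzyme, which is consistent with the preference for an adjacent recognition site stated above.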

In yet another embodiment, the linkers as attached to the ends of a double-stranded DNA molecule would provide the necessary means to allow for the circularization of the DNA molecule. Here the invention relates to the isolation of tags from ends of nucleic acid molecules, where such regions can be derived from different experimental approaches and allow for the characterization of the origin of the initial nucleic acid molecules. Due to the circularization step, the tags as derived from the ends of the same linear DNA molecule are linked to each other by a spacer as derived from linker sequences. Thus the invention provides means for the preparation of a new type of sequence tag, the so-called GSC-tag (Gene-Scanning-CAGE-tag), which would allow for the identification and characterization of nucleic acid molecules by their end sequences. Furthermore, GSC-tags are prepared in such a way that related tags from the same nucleic acid molecule are combined in the same GSC-tag, and that the spacer sequence connecting the two tags from the ends would allow for the labeling of the GSC-tag by a short sequence tag. Therefore the circularization step is an essential part of the invention, as only by connecting the ends of the nucleic acid molecule can it be assured that both ends from the same molecule would be cloned into the same GSC-tag. Alternatively, it can be envisioned that the circularization of a nucleic acid molecule can be achieved by cloning into a vector, where the resulting vector construct would be comprised of circular DNA. Where such a vector would provide the necessary means for the isolation of tags derived from the ends of the insert, it could be foreseen that after cutting out the central part of the insert, the tags could be directly ligated to each other using the backbone of the vector as a spacer to link tags as derived from the same nucleic acid molecule, said insert. 
After the ligation of the two tags by self-ligation of the ends of the vector, such GSC-tags, comprised of the tags from both ends of the insert, said nucleic acid molecule, could be cut out of the vector and further processed according to the invention. Thus it is within the scope of the invention to use a vector or an unrelated nucleic acid molecule to perform the circularization step, where such a vector or nucleic acid molecule would function as a spacer. The use of a vector or an unrelated nucleic acid molecule can be advisable where the linear DNA molecule, said nucleic acid molecule, may not allow for direct circularization, for example due to restrictions by its length. However, for many or most applications it can be preferable to directly circularize the linear DNA molecule, said nucleic acid molecule, using cloning sites as provided by the linkers, since the direct circularization would reduce the number of steps needed to perform the invention.

The circularization reaction can make use of blunt ends or cohesive ends depending on the experimental needs. In a preferable embodiment of the invention the linkers at both ends of the nucleic acid molecule have recognition sites for the same restriction endonuclease or isoschizomers creating the same cohesive ends or blunt ends to allow for the recombination of these ends (compare FIG. 2). In such an experiment, parts of the linker sequences would be cleaved off to create the cohesive ends for self-ligation. In a different embodiment, the ends of the linkers, as released after the digestion with the restriction endonuclease, would have selective binding substances attached to them, which would allow for their separation from the nucleic acid molecules by the means of a high affinity binding substance. Such pairs of selective binding substances and high affinity binding substances include, but are not limited to, the combination of biotin-labeling of nucleic acid molecules and binding to avidin or streptavidin, or the use of digoxigenin and an antibody directed against digoxigenin. Both systems provide convenient means for the separation of free nucleic acid molecules and labeled linker fragments, where such fragments can be easily removed by attaching the high affinity binding substance to an insoluble matrix. Many protocols are known to a person trained to the state of the art for the use of an insoluble matrix for the separation of labeled nucleic acid molecules from non-labeled nucleic acid molecules. In a different embodiment of the invention, the nucleic acid molecule has been prepared in such a way that it is resistant to cleavage by the restriction endonuclease used for digesting the linkers. Such a protection can be achieved for example by the incorporation of modified nucleotides during the chemical or enzymatic synthesis of such nucleic acid molecules, or by the later modification of such nucleic acid molecules by the means of a methyltransferase. 
Many matching pairs of restriction endonucleases and methyltransferases are known to a person trained to the state of the art in the field, which could be applied here, including, but not limited to, those commercially available from New England BioLabs (http://www.neb.comn/nebecomin/default.asp, the product documentation as provided at their homepage is hereby incorporated herein by reference) or Fermentas (http://www.fermentas.com/, the product documentation as provided at their homepage is hereby incorporated herein by reference). Furthermore, it is within the scope of the invention to perform the circularization of the nucleic acid molecules by the means of a recombinase, or overlap extension reactions. In a different embodiment, the circularization step could be performed by the means of a recombinase, where the linkers would provide the necessary means to allow for the recombination step. A person trained to the state of the art is familiar with many recombination systems, which could be applied here. In particular the Cre (Causes REcombination) recombinase from the bacteriophage P1, which catalyzes the recombination between two identical double-stranded loxP sites (Locus Of crossover (X) in P1 sites), is widely used as a valuable tool, where it is a great advantage that the Cre/loxP system functions without any co-factors or additional sequence elements, allowing for effective recombination in vitro. The Cre recombinase mediated step can be performed on purified DNA, where such DNA will be incubated directly with the enzyme. Purified Cre recombinase can be obtained from different suppliers including CLONTECH (BD Biosciences, Palo Alto, Calif., USA), Novagen (Madison, Wis., USA), and New England BioLabs (Beverly, Mass., USA); the makers' instructions and documentation on all of them are hereby incorporated herein by reference. 
Thus the invention provides means whereby, through the use of different restriction endonucleases or recombinases, a linear DNA molecule is converted into a circular DNA molecule. The circularization step brings the ends of the linear DNA molecule, said nucleic acid molecule, together to allow for the preparation of GSC-tags holding sequence information on both ends of the linear DNA molecule, said nucleic acid molecule, and having a linker-derived spacer region, where such a spacer could contain elements to label its origin by a sequence tag. The circularization step further allows for the labeling of nucleic acid molecules, where the recognition sequence of the restriction endonuclease would function as a sequencing tag after the formation of the circular nucleic acid molecule. Thus the invention provides means for the conversion of linear DNA into circular DNA for the purpose of manipulation of the ends of a linear DNA molecule.

In another embodiment of the invention, remaining linear DNA is removed from circular DNA after the circularization reaction by the means of an exonuclease. Such an exonuclease should have a much higher activity for linear DNA as compared to circular DNA. One example for such an exonuclease could be exonuclease III (available from Fermentas, #EN0191, http://www.fermentas.com/, the product documentation to it is hereby incorporated herein by reference) or exonuclease I (available from Fermentas, #EN0581, http://www.fermentas.com/, the product documentation to it is hereby incorporated herein by reference), but there are many more exonucleases known to a person familiar with the field, which could be applied for this step. Thus the invention provides means for the removal of nucleic acid molecules which failed in the self-ligation reaction, and to enrich for circular nucleic acid molecules over linear nucleic acid molecules.

In a different embodiment of the invention the circular DNA is used in an amplification reaction. Many approaches are known to a person trained to the state of the art in the field for the amplification of circular DNA including, but not limited to, the use of the so-called “rolling circle” amplification. As shown in FIG. 3, the amplification of the circular DNA for the purpose of the invention is preferentially done by the means of a rolling circle amplification reaction making use of random primers including, but not limited to, the use of hexamers, and a DNA polymerase with a strong strand-displacement activity including, but not limited to, Phi29 DNA polymerase. Such an amplification reaction can for example be performed with the TempliPhi™ DNA Amplification Kit from Amersham Biosciences (Cat. No. 25-6400-10, the handbook of which is hereby incorporated herein by reference). This kit and any similar isothermal amplification reaction provide very effective means for the amplification of circular DNA over linear DNA, as linear DNA cannot function as a template for rolling circle amplification reactions. Thus the invention provides means for the selective amplification of circular DNA over linear DNA to make circular DNA available for further manipulation.

Further, the invention relates to steps to manipulate DNA fragments in such a way that the linkers attached to the ends of a nucleic acid molecule, and as used in the circularization step, would contain one or more recognition sites for a Class IIs or Class III enzyme adjacent or close to the cloning sites for said nucleic acid molecule. In a preferable embodiment, the Class IIs enzyme would be GsuI, in a more preferable embodiment, the Class IIs enzyme would be MmeI, and in an even more preferable embodiment, the Class III restriction enzyme would be EcoP15I. Thus the length of the tags as cut off from the ends of the DNA molecule may vary depending on the restriction enzyme used to create them. Furthermore, it is within the scope of the invention that different enzymes are used for the digestion at the 3′- and the 5′-end, and that the 3′-end and 5′-end related tags have a different length. Therefore tags as derived from the ends of a DNA molecule, said nucleic acid molecule, may have a length of ten to fifteen (10-15), fifteen to twenty (15-20), twenty to twenty-five (20-25), or twenty-five to thirty (25-30) bp. Just as an example, in the case of using the preferable enzyme MmeI, the tags would be some 16/18 bp in length. Thus the linkers would provide the necessary means to cleave out fragments, said tags, from the ends of such DNA molecules. Thus the invention relates to the isolation of tags from ends of nucleic acid molecules, where such tags could be used for the identification and characterization of the nucleic acid molecule from which the tags are derived. In a preferable embodiment of the invention such tags are isolated from the nucleic acid molecules after the self-ligation step. In this embodiment, the fragments as released by digestion with the Class IIs or Class III enzyme would be comprised of tags derived from both ends of the nucleic acid molecule linked to each other by sequences derived from the linkers. 
Thus the invention provides means for the isolation of sequencing tags from both ends of a nucleic acid molecule, where the two tags as derived from the same nucleic acid molecule would be attached to each other via a spacer as derived from the linkers. As the connecting linker sequences comprise the recognition site used in the circularization step, the linker would further contain a sequencing tag for labeling the origin of the tags in pluralities of nucleic acids as obtained from different samples.

In a different embodiment, the invention relates to the cloning of the tags as derived from both ends of DNA molecules, said GSC-tags, where such tags are purified and cloned into concatemers, and where such concatemers are cloned into libraries for easier manipulation and sequencing (FIG. 4). In a preferable embodiment, the digestion step with the Class IIs or Class III enzyme creates cohesive ends for the ligation of different tags to each other. For example, in the case of MmeI, the enzyme would create N2 overhangs, where N2 would allow for 16 different combinations. Therefore, for the use of complex samples comprised of pluralities of nucleic acid molecules, 16 different combinations would allow for the cloning of tags into concatemers. Reaction conditions for concatenation reactions on mixtures of tags prepared by the use of MmeI are known to a person trained to the state of the art in the field including, but not limited to, protocols used for the preparation of Di-Tags within Long-SAGE libraries (WO 02/10438 A2, hereby incorporated herein by reference). In a different embodiment, the ends created by the digestion with the Class IIs or Class III enzyme are converted into blunt ends, and the concatenation reaction makes use of the ligation of blunt ends. Many different approaches are known to a person trained to the state of the art for the blunting of DNA including, but not limited to, those described by Sambrook J. and Russell D. W., ibid, hereby incorporated herein by reference. Thus the invention provides means for the assembly of tags into concatemers for the purpose of high-throughput sequencing of tags as derived from the ends of nucleic acid molecules, said GSC-tags.
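The count of 16 combinations for an N2 overhang follows from four possible bases at each of two positions. The following minimal Python sketch enumerates them and pairs each overhang with the overhang it could anneal to, under the assumption, made here for illustration only, that two overhangs are compatible when one is the reverse complement of the other. All names are hypothetical.

```python
from itertools import product

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


# All two-base overhangs: 4 bases at each of 2 positions.
overhangs = ["".join(pair) for pair in product("ACGT", repeat=2)]

# Overhang that can base-pair with each overhang in antiparallel orientation.
partner = {o: revcomp(o) for o in overhangs}

# Overhangs that are their own reverse complement.
self_compatible = sorted(o for o in overhangs if partner[o] == o)

print(len(overhangs), self_compatible)
```

In a complex mixture of tags every overhang is expected to find matching partners, which is consistent with the statement above that the 16 combinations allow for concatenation.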

In another embodiment of the invention, the concatemers are cloned into a vector to prepare a library (FIG. 5). For the cloning into the vector, matching recombination sites can be used as used in the concatenation reaction, or the concatemers could be blunted at their ends to allow for cloning into a vector. Many different approaches are known to a person trained to the state of the art for the blunting of DNA and the ligation of blunt ends including, but not limited to, those described by Sambrook J. and Russell D. W., ibid, hereby incorporated herein by reference. In a preferable embodiment of the invention the concatemers would be cloned into the vector pGSC (FIG. 6), which provides different cloning sites for the use of cohesive or blunt ends. In a different embodiment of the invention linkers are attached to the ends of the concatemers, where such linkers would provide priming sites for the amplification of the concatemers and/or cloning sites for the cloning of the concatemers into a vector. It is within the scope of the invention to use such linkers to introduce recombination sites for the cloning of the concatemers by the means of a recombinase rather than using classical means such as restriction endonucleases including, but not limited to, rare cutters and a ligase. In one example, the cloning of the concatemers could be performed with the Gateway® System from Invitrogen (http://www.invitrogen.com/, the information to which as provided on their homepage is hereby incorporated herein by reference). In a more preferable example, the Gateway® BP Clonase™ Enzyme Mix from Invitrogen (Cat. No. 11789-013, the product information on which is hereby incorporated herein by reference) is used to clone the PCR products comprising the concatemer into a target vector. In yet another embodiment the invention relates to the cloning of tags from different samples into a library, where a label would mark the origin of each molecule within such a mixed tag library. 
Similarly, tags as prepared by different approaches can be individually labeled and used for the preparation of pooled libraries, where, as explained above, sequences derived from the linkers would function as a label of each tag. Furthermore, in cases where linkers have been used for the cloning and/or amplification of the concatemers, such terminal linkers could introduce sequence tags to mark concatemers and their origin. Thus the invention relates to the preparation of libraries with the option of labeling tags by defined sequences, where such sequences would be introduced during the linker ligation steps before cloning into libraries.

In a different embodiment, the invention provides means for the analysis of concatemers by sequencing in combination with computational analysis. Regions as derived from linkers would in such an application provide information on the origin and the orientation of the sequencing tags within the concatemer, as compared to the regions derived from the ends of the nucleic acid molecule. As the structure of the GSC-tag is known, computational means would allow for the identification of the different regions within the GSC-tag, such as those derived from the nucleic acid molecule and those derived from the linker. The sequencing tags as such would be further analyzed and annotated by computational methods including, but not limited to, the mapping to genomic sequences, alignments to sequence information within the public domain including those on transcribed regions, alignments against each other, or statistical analysis on GSC-tag frequencies within libraries, including, but not limited to, the applications disclosed in PCT/JP03/15956, PCT/JP03/07514 and WO 02/10438, all hereby incorporated herein by reference. Thus the invention provides different means for the analysis of nucleic acid molecules, for example for their expression in a biological sample, or for example for their contribution to a given cDNA library.
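As a minimal illustration of how a known GSC-tag structure could be exploited computationally, the following Python sketch splits a concatemer read into pairs of end tags by searching for the linker-derived spacer and taking a fixed number of bases on either side, and then counts how often each pair occurs. The spacer sequence, the tag length and all function names are hypothetical and chosen for illustration only; real reads would additionally require handling of sequencing errors, both read orientations and tags of variable length.

```python
from collections import Counter


def split_gsc_tags(read, spacer, tag_len):
    """Return (tag before spacer, tag after spacer) for every spacer in a read."""
    pairs = []
    pos = read.find(spacer)
    while pos != -1:
        left_start = pos - tag_len
        right_end = pos + len(spacer) + tag_len
        # Keep only spacers that are flanked by two complete tags.
        if left_start >= 0 and right_end <= len(read):
            pairs.append((read[left_start:pos],
                          read[pos + len(spacer):right_end]))
        pos = read.find(spacer, pos + len(spacer))
    return pairs


SPACER = "GTCGGACCTAGGCATGC"  # hypothetical linker-derived spacer
read = "AAAA" + SPACER + "CCCC" + "GGGG" + SPACER + "TTTT"  # toy concatemer

pairs = split_gsc_tags(read, SPACER, tag_len=4)
frequencies = Counter(pairs)  # GSC-tag frequencies within a library
print(pairs)
```

The resulting tag pairs could then be passed to a mapping or alignment tool of choice, and the frequency table would be the starting point for statistical analysis.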

In yet another embodiment, the invention relates to the sequencing of the tags to allow for their annotation by computational means and their statistical analysis, where such tags would be derived from regions within genomes. It is within the scope of the invention to prepare fragments from genomic DNA, and to characterize such fragments by sequencing tags derived from the ends of such fragments of genomic DNA. In one embodiment such genomic DNA fragments could be obtained from regions bound to DNA binding proteins. One approach for the identification of targets for distinct DNA binding molecules is the so-called “Chromatin Immunoprecipitation” (ChIP), where in vivo DNA binding molecules are cross-linked to their binding sites within genomic DNA by treatment with formaldehyde (Kuras L., Methods Mol. Biol. 284, 147-162 (2004), hereby incorporated herein by reference). After immunoprecipitation of the protein-DNA complexes with specific antibodies targeted against such DNA binding molecules, DNA fragments can be amplified from such complexes by any method known to a person trained to the state of the art in the field, and forwarded to cloning of tags from both ends of such genomic fragments by the means of the invention. Similar information can further be obtained by the dam methyltransferase assay, which applies fusion proteins of the dam methyltransferase and DNA binding factors. The DNA-binding domain of the DNA binding factor as part of the fusion protein will tether the dam methyltransferase to specific binding sites in the genome, which results in adenine methylation at the binding site. Isolated genomic DNA can then be cleaved by the methylation-dependent restriction endonuclease DpnI, and DNA fragments are isolated for analysis (van Steensel B. and Henikoff S., Nat. Biotechnol. 18, 424-428 (2000), and van Steensel B. et al., Nat. Genet. 27, 304-308 (2001), both hereby incorporated herein by reference). 
Similar to genetic fragments obtained by ChIP, those fragments can be applied to perform the invention. Thus the invention relates to the characterization of genetic elements within genomes, where such elements could be analyzed by computational means such as mapping to a genome or the like.

In yet another embodiment, the invention relates to the preparation of hybridization probes from the ends of nucleic acid molecules, where such regions would be analyzed by the means of in situ hybridization (FIG. 8). Thus the invention provides means for the confirmation of the borders of nucleic acid molecules by independent means, where the hybridization probes could be prepared by ligation of linkers to the ends of a nucleic acid molecule, and where such linkers would be used for the preparation of hybridization probes. In a different embodiment of the invention sequences as derived from the tags would be used for primer design, where such primers could be used to drive the preparation of the hybridization probes.

In a different embodiment of the invention, hybridization probes as derived from sequencing tags, said oligonucleotides, are used in in situ hybridization experiments. Such experiments include, but are not limited to, the use of microarrays (FIG. 9). In a preferable embodiment, the microarray is a tiled array, where short oligonucleotides cover partial or entire genomic DNAs, as for example described by Kapranov P. et al., ibid, hereby incorporated herein by reference. Thus the invention provides means for the annotation of sequencing tags by hybridization to a microarray, where such a microarray comprises genomic regions. However, the use of hybridization probes derived from sequencing tags is not limited to the use in microarray experiments, as a person trained to the state of the art in the field will know many more applications for the use of hybridization probes including, but not limited to, the ones described by Sambrook J. and Russell D. W., ibid, hereby incorporated herein by reference, or the use of tissue arrays (Sauter G. et al., Nature Reviews Drug Discovery 2, 962-972 (2003), hereby incorporated herein by reference).

In yet another embodiment, the invention provides means for the preparation of 3′- and 5′-end specific hybridization probes directly from a plurality of RNA molecules. In this embodiment double-stranded linkers having single-stranded overhangs attached to one of the two strands are ligated to the end sequences of the RNA molecules, where one of the strands within the linker will prime the synthesis of the second strand, and where adding terminators into the reaction mixture can control the length of the newly synthesized strand. In the case of preparing probes related to 3′-ends, the probe can be synthesized directly from the RNA template, whereas for the preparation of probes related to the 5′-end, the probes would be prepared from the first-strand cDNA as a template. Many different protocols are known to a person trained to the state of the art to perform the linker ligation step and the following primer extension reaction, including, but not limited to, Shibata Y. et al., Biotechniques 30, 1250-1254 (2001), hereby incorporated herein by reference. In particular, the use of double-stranded linkers having random overhangs or overhangs of defined sequence is of great value to direct the linker to the ends of RNA/DNA molecules. Thus, the invention provides a means for avoiding internal priming. Furthermore, such linkers can be used for the priming of non-polyadenylated RNA, where a linker having an oligo-dT overhang can specifically block the priming from polyadenylated RNA. Such a linker would further have features to block priming of the extension reaction from polyA mRNA, and would have a high affinity label attached to it for selective removal of the ligation product. The invention provides a means for the preparation of end-specific hybridization probes from a plurality of RNAs, which can be used in combination with tiled arrays or in any other hybridization experiment known to a person familiar with the state of the art.

In a different embodiment of the invention, sequence information derived from the concatemers can be used to synthesize specific primers for the cloning of full-length cDNAs. In such an approach, the sequences derived from given 5′- and 3′-end specific tags allow the design of forward and reverse primers to be used in the amplification reaction. Amplification by the polymerase chain reaction (PCR) can be performed using a template derived from a plurality of RNA obtained from a biological sample and an oligo-dT primer. In the first step the oligo-dT primer and a reverse transcriptase are used to synthesize a cDNA pool. Similarly, the first-strand cDNA synthesis could be primed by the aforementioned ligation of a double-stranded linker having a single-stranded overhang to the 3′-end of RNA. In the second step forward and reverse primers derived from the tags are used to amplify a full-length cDNA from the cDNA pool. Similarly, a specific full-length cDNA can be amplified from an existing cDNA library. Further, it is within the scope of the invention to use sequence information derived from tags related to genetic elements to design primers for the amplification and cloning of regions within genomic DNA, said promoters or fragments thereof. This includes the option to prepare one primer from a GSC-tag and the second primer from a start site of transcription to amplify or clone larger fragments of promoter regions. Many approaches are known to a person familiar with the art for the identification of start sites of transcription including, but not limited to, the CAGE method disclosed in PCT/JP03/07514, and Shiraki T. et al., Proc. Natl. Acad. Sci. USA 100, 15776-15781 (2003), both hereby incorporated herein by reference.
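The relationship between the two end tags and a primer pair can be sketched in a few lines of Python. The sketch assumes, for illustration only, that both tags are written 5′ to 3′ in the sense orientation of the transcript, so that the forward primer equals the 5′-end tag and the reverse primer is the reverse complement of the 3′-end tag. The function names and the short example tags are hypothetical, and practical primer design would additionally consider melting temperature, length and specificity.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of a DNA sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def primers_from_tags(tag_5prime, tag_3prime):
    """Return (forward, reverse) primers, both written 5' to 3'."""
    forward = tag_5prime.upper()   # anneals to the antisense strand
    reverse = revcomp(tag_3prime)  # anneals to the sense strand
    return forward, reverse


forward, reverse = primers_from_tags("ACGTT", "GGCAA")  # toy tags
print(forward, reverse)
```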

In a different embodiment, the invention relates to a kit, where such a kit would provide the necessary reagents, enzymes and protocols to perform the invention. Thus it can be envisioned that different kits could be provided, where some of the reagents, enzymes or protocols are distinct in order to adapt the reaction conditions to particular questions or nucleic acid molecules. Such kits could be of value as tools in the field of life sciences, or in forensic assays targeting the detection and/or identification of certain nucleic acid molecules. Thus it is within the scope of the invention to prepare kits which would be designed for the detection of specific nucleic acid molecules. In one embodiment, such a selective enrichment would be achieved by the manipulation of single-stranded DNA by the means of subtraction and/or normalization. In a different embodiment, such a selective enrichment would be achieved by the use of specific primers during an amplification step. In a more preferable embodiment, such a selective enrichment would be achieved by the use of specific primers during the rolling-circle amplification step. Furthermore, a kit for the preparation of hybridization probes according to the invention is within the scope of the invention. Similarly, such a kit could provide the necessary means to apply the invention for the purpose of diagnostics.

In conclusion, the invention provides new approaches for the cloning and analysis of sequencing tags by the means of high-throughput sequencing, which will be of great value for the analysis of nucleic acid molecules. The invention further provides the necessary tools to prepare specific hybridization probes as needed for performing in situ hybridization experiments, where related tag sequences would drive the probe design. Thus, the invention is of high importance especially for the annotation of in situ hybridization experiments using tiled arrays, and offers the necessary means for preparing hybridization probes derived from defined regions within nucleic acid molecules.

EXAMPLES

The present invention will now be explained in more detail with reference to the following examples. All names and abbreviations as used to describe the invention herein shall have the meaning as known to a person skilled in the art.

Example 1 Isolation of RNA

To perform the invention, mRNA or total RNA samples can be prepared by standard methods known to a person trained in the art of molecular biology, as for example given in more detail in Sambrook J. and Russell D. W., ibid, hereby incorporated herein by reference. Furthermore, Carninci P. et al. (Biotechniques 33 (2002) 306-309, hereby incorporated herein by reference) described a method to obtain cytoplasmic mRNA fractions. Although the use of cytoplasmic RNA can be preferable, the invention is not limited to this method, and any other approach for the preparation of mRNA or total RNA should allow for the performance of the invention in a similar manner.

The preparation of mRNA from total RNA or cytoplasmic RNA is preferable but not essential to perform the invention, as the use of total RNA can provide satisfying results in combination with the Cap-selection step performed during full-length cDNA library preparation. Here, we have commonly used the Cap-trapper approach, which effectively removes ribosomal RNA from library preparations. Generally speaking, mRNA represents about 1-3% of the total RNA preparations, and it can be subsequently prepared by using commercial kits based on oligo dT-cellulose matrixes. Such commercial kits include, but are not limited to, the MACS mRNA isolation kit (Miltenyi), which provided satisfactory mRNA yields under the recommended conditions when applied for the preparation of mRNA fractions for performing the invention. To perform the invention one cycle of oligo-dT mRNA selection is sufficient, as extensive mRNA purification can cause a loss of long mRNAs.

All RNA samples used to perform the invention were analyzed for their ratios of the OD readings at 230, 260 and 280 nm to monitor the RNA purity. Removal of polysaccharides was considered successful when the 230/260 ratio was lower than 0.5, and an effective removal of proteins was obtained when the 260/280 ratio was higher than 1.8 or around 2.0. The RNA samples were further analyzed by electrophoresis in an agarose gel to prove a good ratio between the 28S and 18S rRNA in total RNA preparations (note that rRNA size may change for preparation of total RNA from species other than mammals), and to show the integrity of the RNA fractions.
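The two purity criteria given above can be expressed as a short Python check. This is only a sketch: the function name and the returned keys are hypothetical, the thresholds are the ones stated in this example (230/260 lower than 0.5, 260/280 higher than 1.8), and the OD readings are assumed to be blank-corrected.

```python
def rna_purity(od230, od260, od280, max_230_260=0.5, min_260_280=1.8):
    """Evaluate RNA purity from OD readings at 230, 260 and 280 nm."""
    ratio_230_260 = od230 / od260
    ratio_260_280 = od260 / od280
    return {
        "ratio_230_260": ratio_230_260,
        "ratio_260_280": ratio_260_280,
        # Low 230/260 ratio: polysaccharides considered removed.
        "polysaccharides_removed": ratio_230_260 < max_230_260,
        # High 260/280 ratio: proteins considered removed.
        "proteins_removed": ratio_260_280 > min_260_280,
    }


good = rna_purity(od230=0.4, od260=1.0, od280=0.5)
poor = rna_purity(od230=0.6, od260=1.0, od280=0.6)
print(good, poor)
```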

Example 2 cDNA Library Preparation

For the purpose of this example, full-length cDNA libraries were constructed as described by Carninci P. and Hayashizaki Y., ibid, hereby incorporated herein by reference. This procedure makes use of the Cap-trapper approach for full-length cDNA cloning. DNA fragments were cloned into the phage/vector system pFLC, as disclosed in patent application WO 02/070720 A1, hereby incorporated herein by reference.

Phage solutions as prepared to perform the invention were stored in medium containing 7% DMSO and kept at −80° C. However, the invention is not limited to the aforementioned procedure for library preparation, as a person trained to the state of the art knows other methods for the preparation of full-length selected libraries.

Example 3 Removal of polyA-tails from cDNA

For the purpose of the invention, cDNAs are prepared from RNA or mRNA fractions as described in Example 2 with the following modifications, which are necessary to remove polyA-tails from cDNA preparations prepared by the use of an oligo-dT primer. Stretches of oligo-dT derived sequences are removed by the means of the Class IIs enzyme GsuI as described by Shibata Y. et al., Biotechniques. 1042 to 1044, 1048-1049 (2001), hereby incorporated herein by reference.

For the first strand synthesis, the following primer is used, which has a recognition site for GsuI:

Primer GsuI-T14: 5′-AGAGAGAGAGTCGGAGTTTTTTTTTTTTTTVN (SEQ ID NO: 1)

After the first strand cDNA synthesis, the materials are processed as described in Example 2 for the selection of full-length cDNAs by the Cap-Trapper method. In the linker ligation step, the following oligonucleotides were used for linker preparation and to introduce MmeI and XmaJI sites:

5′-Adaptor GS Adaptor C N6-up: 5′-GAGAGAGAGACTCGAGACGGCATATCCTAGGTCCGACNNNNNN (SEQ ID NO: 2)
5′-Adaptor GS Adaptor C GN5-up: 5′-GAGAGAGAGACTCGAGACGGCATATCCTAGGTCCGACGNNNNN (SEQ ID NO: 3)
5′-Adaptor GS Adaptor C down: 5′-(p)GTCGGACCTAGGATATGCCGTCTCGAGTCTCTCTCTC (SEQ ID NO: 4)
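The design of these oligonucleotides can be checked with a minimal Python sketch, which confirms that the lower strand is the reverse complement of the defined part of either upper strand, so that the random N6 or GN5 portion remains as a single-stranded overhang, and which locates the two sites within the upper strand. The sequences are taken from SEQ ID NOs: 2 to 4 above. The site sequences used in the search, TCCGAC as an instance of the MmeI site TCCRAC and CCTAGG for XmaJI, are the commonly cited recognition sequences and are assumptions here, as are all variable and function names.

```python
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def revcomp(seq):
    """Reverse complement of a DNA sequence; N stays N."""
    return seq.translate(COMPLEMENT)[::-1]


UP_N6 = "GAGAGAGAGACTCGAGACGGCATATCCTAGGTCCGACNNNNNN"   # SEQ ID NO: 2
UP_GN5 = "GAGAGAGAGACTCGAGACGGCATATCCTAGGTCCGACGNNNNN"  # SEQ ID NO: 3
DOWN = "GTCGGACCTAGGATATGCCGTCTCGAGTCTCTCTCTC"          # SEQ ID NO: 4


def duplex_part(up):
    """Part of an upper strand that pairs with the lower strand."""
    return up[:len(DOWN)]


def overhang(up):
    """Single-stranded 3' part of an upper strand."""
    return up[len(DOWN):]


pairs_n6 = duplex_part(UP_N6) == revcomp(DOWN)
pairs_gn5 = duplex_part(UP_GN5) == revcomp(DOWN)
xmaji_pos = UP_N6.find("CCTAGG")     # assumed XmaJI site
mmei_end = UP_N6.find("TCCGAC") + 6  # end of the assumed MmeI site

print(pairs_n6, pairs_gn5, overhang(UP_N6), overhang(UP_GN5))
print(xmaji_pos, mmei_end, len(DOWN))
```

Under these assumptions the MmeI site ends exactly where the overhang begins, which corresponds to the embodiment in which the recognition site is adjacent to the junction point between the linker and the nucleic acid molecule.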

Note that the two upper strands are used in a ratio of GN5 to N6 of 4:1. After preparation of the second strand, double-stranded cDNAs were purified as described in Example 2 before being forwarded to GsuI digestion under the following conditions:

cDNA                             X μl
10x buffer B (Fermentas)         5 μl
1 u/μl GsuI (Fermentas)          Y μl (10 u/μg cDNA)
0.1x TE                          Z μl
Total volume                     50 μl*

*Depending on sample amount, change the reaction volume.

After 1 h incubation at 30° C., the following solutions were added to the reaction:

0.5 M EDTA                       4 μl
10% SDS                          4 μl
20 μg/μl Proteinase K (Qiagen)   4 μl

Incubate at 45° C. for 15 min, and continue with Phenol/Chloroform extraction using the following volumes:

Phenol/Chloroform 200 μl

Centrifuge at room temperature at 15,000 rpm for 3 min, perform back-extraction with 100 μl of 0.1× TE, repeat the extraction steps with Chloroform only, and recover the aqueous phase for further purification by microfiltration on a Microcon YM100 (Millipore).

Add 0.1× TE buffer to the cDNA to a final volume of 400 μl, and follow the maker's directions, hereby incorporated herein by reference, for the filtration step. The volume of the recovered sample should be about 15 μl.

As an option, the 2 bp overhangs created by GsuI can be converted into blunt ends using the 3′ to 5′ exonuclease activity of T4 DNA polymerase. This step is not essential to perform the invention, as adaptors with a random overhang of 2 bp can also be applied in the ligation step. Note that the blunting step removes 2 bp from the original cDNA.

cDNA                                    X μl (>0.1 pmole)
0.1x TE                                 Y μl
Total volume                            14.6 μl

Incubate at 65° C. for 5 min, and place on ice immediately. Under the assumption that 100 ng of 2,000 bp cDNA/GsuI are equal to 0.3 pmol of ends, add the following solutions for the blunting step:

10x T4 DNA Polymerase Buffer (Takara)   2 μl
2.5 mM dNTPs (Takara)                   1.4 μl
Vortex
0.1% BSA                                2 μl
Vortex
1 u/μl T4 DNA polymerase                1 μl (1 u)

Mix by pipetting gently up and down, and incubate at 37° C. for 5 min; make sure that the sample is not incubated for a longer time.

Vortex vigorously on ice to inactivate T4 DNA polymerase, and add the following solutions:

0.1x TE                                 30 μl
0.5M EDTA                               1 μl
10% SDS                                 1 μl
Proteinase K (Qiagen)                   2 μl
Total volume                            55 μl

Incubate at 45° C. for 15 min, and continue with a Phenol/Chloroform extraction using 50 μl of Phenol/Chloroform, and recover the aqueous phase for further purification by microfiltration on a Microcon YM100 (Millipore). The filtration step follows the maker's instructions, hereby incorporated herein by reference.

To the blunted 3′-end, a double-stranded adaptor is ligated, where the 3′-adaptor was assembled from the following oligonucleotides:

3′-Adaptor GS 3′ Adaptor C up: 5′-(p)GTCGGACCTAGGAATTGCCGTG (SEQ ID NO: 5)
3′-Adaptor GS 3′ Adaptor C Blunt-down: 5′-GATCCACGGCAATTCCTAGGTCCGAC (SEQ ID NO: 6)

Note that in a different embodiment of the invention, the cDNA fragments can be amplified by PCR or the like to obtain larger amounts of DNA for further manipulation. In such a case, primers would be selected from the 5′- and 3′-adaptors, and PCR reactions should be performed with a high fidelity DNA polymerase. Although the amplification of the DNA materials is possible after the ligation of the second adaptor, we commonly refrain from amplifying the DNA at this stage, as the PCR reaction is highly biased towards shorter DNA fragments and leads to an uneven distribution of tags within the final library.

For the 3′-adaptor ligation step prepare the following reaction mixture (cDNA:adaptor ratio should be 1:<50):

cDNA                           X μl
0.4 μg/μl GS 3′ Adaptor C      0.5 μl (200 ng)
0.1x TE                        Y μl
10× Ligation Buffer (NEB)      2 μl
400 u/μl T4 DNA Ligase (NEB)   0.5 μl
Total volume                   20 μl

Incubate at 16° C. overnight, and inactivate the ligase at 65° C. for 15 min. Optionally, the ligation product can be further purified by Proteinase K treatment, followed by Phenol/Chloroform extraction and ultrafiltration to remove remaining free adaptor. However, those purification steps are not essential to perform the invention, as the ligation product is commonly clean enough for digestion with a standard restriction enzyme, for the purpose of this example the enzyme XmaJI. Furthermore, free adaptor can be removed after the digestion step.

cDNA                        20 μl
10x buffer Y+ (Fermentas)   10 μl
10x BSA                     10 μl
XmaJI (Fermentas)           X μl (50 u/μg)
0.1x TE                     Y μl
Total volume                100 μl

Incubate at 37° C. for 1 h, and inactivate the enzyme by heating to 65° C. for 15 min. Further purify the cDNA fragments by Proteinase K treatment and Phenol/Chloroform extraction, followed by PEG precipitation. The PEG precipitation is applied here to remove the very short fragments cut off from the adaptors and free adaptors. For the purpose of this example, short fragments were removed by PEG precipitation, as the adaptors used here were not labeled with a selective binding substance, e.g. biotin or digoxigenin. Example 10 describes the use of labeled linkers in fragment purification. For the precipitation by PEG prepare the following:

cDNA                        150 μl
0.1x TE                     50 μl
20% PEG8000                 250 μl
0.1M MgCl₂                  50 μl
Total volume                500 μl

Leave at room temperature for 10 min before centrifugation at 15,000 rpm at room temperature for 10 min, remove the supernatant completely, and rinse the tube wall well with 20 μl of TE to make sure that the entire pellet is re-suspended. Leave the tube for a while at room temperature before transferring the solution into a new siliconized tube. Wash the original tube again with 20 μl of TE to make sure that the sample is recovered completely. Combine the cDNA solutions in one tube (about 40 μl in total). Optionally, remaining 3′ adaptors can be further removed by gel filtration on a CL4B column (Amersham Biosciences).

Example 4 Preparation of GSC-Tags

For the preparation of GSC-Tags, the aforementioned cDNA fragments are circularized by self-ligation using the cohesive ends created by digestion with XmaJI. It is important to perform this ligation step in a large volume (1 ng DNA/μl) to favor self-ligation over inter-molecular ligation. For the reaction, set up the following solutions (split the cDNA over several tubes where necessary to achieve a high dilution):

cDNA                           X μl (1 μg)
10x Ligation Buffer (NEB)      100 μl
400 u/μl T4 DNA Ligase (NEB)   50 μl (20,000 units, 20 u/ng)
H₂O                            Y μl
Total volume                   1000 μl
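The dilution rule for self-ligation can be sketched as a short calculation. The function below is a hypothetical helper, not part of the protocol; it assumes the figures given above (at most 1 ng DNA/μl, 1000 μl per tube, 20 u ligase per ng, ligase stock at 400 u/μl, buffer at 10x) and splits larger samples evenly over several tubes.

```python
import math

def plan_self_ligation(cdna_ng, max_ng_per_ul=1.0, tube_volume_ul=1000.0,
                       ligase_u_per_ng=20.0, ligase_stock_u_per_ul=400.0):
    """Split cDNA over tubes so that no tube exceeds max_ng_per_ul."""
    capacity_ng = max_ng_per_ul * tube_volume_ul
    tubes = max(1, math.ceil(cdna_ng / capacity_ng))
    ng_per_tube = cdna_ng / tubes
    return {
        "tubes": tubes,
        "ng_per_tube": ng_per_tube,
        # ligase volume per tube, scaled to the DNA amount in that tube
        "ligase_ul": ng_per_tube * ligase_u_per_ng / ligase_stock_u_per_ul,
        # 10x buffer is always one tenth of the tube volume
        "buffer_ul": tube_volume_ul / 10.0,
    }
```

For 1 μg of cDNA this reproduces the single 1000 μl reaction with 50 μl ligase and 100 μl buffer shown above; 2.5 μg would be split over three tubes.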

Incubate at 23° C. for 2 h in a water bath, before inactivating the ligase at 65° C. for 10 min.

The ligation product was further purified using a “QIAquick PCR Purification Kit” (Qiagen) according to the maker's directions, hereby incorporated herein by reference.

Remaining unligated, and thus linear, DNA in the ligation mixture was removed by Exonuclease III treatment. Exonuclease III acts only on double-stranded linear DNA and does not cut the circular DNA under the controlled conditions. For Exonuclease III digestion set up the following reaction:

Self-ligation products                   X μl (1.5 μg)
10x Exonuclease III buffer (Epicentre)   30 μl
200 u/μl Exonuclease III (Epicentre)     3 μl (400 u/μg)
H₂O                                      Y μl
Total volume                             300 μl (5 ng/μl)

Incubate at 37° C. for 30 min and add:

0.5M EDTA 6 μl

Inactivate Exonuclease III at 65° C. for 15 min, cool on ice, and purify the DNA by Proteinase K digestion, Phenol/Chloroform extraction, and ethanol precipitation as described above. Dissolve the remaining pellet in 15 μl of 0.1× TE.

At this stage usually only very small amounts of DNA are available for further processing, and an amplification step is essential in most cases to obtain sufficient DNA for tag cloning. This is particularly true where the cDNA was not amplified by PCR after the second linker ligation step (see above). As it is desirable here to amplify only circular DNA, this amplification step makes use of so-called rolling-circle amplification, including but not limited to the TempliPhi Amplification Kit from Amersham Biosciences (Product No. 25-6400-10, the instructions of which are hereby incorporated herein by reference). This kit makes use of the Phi29 DNA polymerase and random priming by hexamers to perform the amplification reaction. Commonly as little as 1 ng of circular DNA is sufficient for amplification, and the reactions can yield up to 1 μg of DNA after 4 to 12 h. As the reaction is sensitive to the use of too much template, it can be preferable to run multiple reactions in parallel. Otherwise, amplification reactions are performed according to the maker's directions. Note that the reaction product can be very viscous as it contains very long stretches of DNA.

Amplification products are directly forwarded to digestion with the Class IIs enzyme, for the purpose of this example MmeI. Where needed, viscous DNA solutions can be diluted to allow for better pipetting. For the digestion with MmeI set up the following reaction:

Amplified DNA             X μl (20 μg)
3.2 mM SAM*               20 μl (64 μM)
10x NEB buffer 4 (NEB)    100 μl
2 u/μl MmeI (NEB)         15 μl (1.5 u/μg, 30 u)
H₂O (Invitrogen)          Y μl
Total volume              1000 μl

*S-Adenosylmethionine (NEB)
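The enzyme and cofactor amounts in the MmeI reaction follow from two simple relations, sketched below. The helper names are assumptions introduced for illustration; the numbers are those of the reaction above.

```python
def enzyme_volume_ul(dna_ug, units_per_ug, stock_u_per_ul):
    """Volume of enzyme stock needed for a given DNA amount."""
    return dna_ug * units_per_ug / stock_u_per_ul

def final_concentration(stock_conc, stock_volume_ul, total_volume_ul):
    """Concentration after dilution, in the same unit as stock_conc."""
    return stock_conc * stock_volume_ul / total_volume_ul

# 20 ug DNA at 1.5 u/ug with a 2 u/ul stock
mmei_ul = enzyme_volume_ul(20, 1.5, 2)
# 20 ul of 3.2 mM (3200 uM) SAM in a 1000 ul reaction
sam_um = final_concentration(3200, 20, 1000)
```

Under these inputs `mmei_ul` is 15 μl and `sam_um` is 64 μM, in agreement with the values listed in the reaction setup.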

Incubate at 37° C. for 1 h, and purify the reaction mixture by Proteinase K digestion, Phenol/Chloroform extraction, and precipitation under the following conditions:

Add to about 600 μl DNA solution:

1 μg/μl Glycogen   3 μl
5 M NaCl           30 μl
Isopropanol        600 μl

Incubate at −20° C. for more than 30 min, and centrifuge at 15,000 rpm at 4° C. for 15 min before washing the pellet twice with 80% ethanol, and dissolve the precipitate in 50 μl H₂O. As MmeI digestion can be incomplete, analyze the reaction product by gel electrophoresis before continuing the process.

The short GSC-tags as cut out with MmeI have to be separated from the remaining cDNA fragments. In theory, a GSC-tag has some 58 bp (2 times 20 bp cut off from cDNA ends plus 18 bp from the three recognition sites derived from the linkers), where the length of the tag may vary within a range of some 4 to 8 bp as MmeI digestion is not always precise. However, with some 58 bp in length the GSC-tags are much shorter than cDNA fragments but still longer than the adaptors used in the earlier preparation steps. Thus the GSC-tags can be purified by size-selection.
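The expected tag length can be reproduced from the adaptor sequences of Example 3. The sketch below assumes that the 18 bp spacer consists of an inverted MmeI site, the XmaJI site and a second MmeI site, as read from the junction-proximal ends of SEQ ID NO: 2 and SEQ ID NO: 5; the constant and function names are illustrative only.

```python
MMEI_SITE = "TCCGAC"    # MmeI site as it appears in the 5'-adaptor
XMAJI_SITE = "CCTAGG"   # XmaJI site used for circularization

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# cDNA 3' end | inverted MmeI | XmaJI | MmeI | cDNA 5' end
SPACER = revcomp(MMEI_SITE) + XMAJI_SITE + MMEI_SITE

def expected_tag_length(cut_distance=20, spacer=SPACER):
    """Two cDNA-derived flanks plus the linker-derived spacer."""
    return 2 * cut_distance + len(spacer)
```

With a cut distance of 20 bp this gives 58 bp. Note that the assumed spacer is its own reverse complement, so the spacer alone does not reveal in which orientation a tag is read.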

GSC-tags were separated from other cDNAs by agarose gel electrophoresis. For the electrophoresis proceed as follows:

Sample Preparation:

Sample DNA     20 μl (~800 ng)
10% SDS        1.5 μl (final ~0.5%)
0.1x TE        3.5 μl
6x Dye (TAE)   5 μl
Total volume   30 μl

Gel: 5% SeaPlaque/1× TAE/EtBr+, Mupid Mini Gel

Buffer: 1× TAE buffer/EtBr+

Run: Mupid System, 50 V, 150 min

After electrophoresis, cut out the GSC-tags by comparison to an appropriate size marker using a UV transilluminator at 365 nm. When cutting out the gel slices, make sure to keep their size as small as possible. Furthermore, it is important to cut precisely the band around 58 bp; it is preferable to cut sharply around the band rather than to retrieve as much DNA as possible.

Transfer gel pieces into a tube, add 300 μl TE buffer, and keep the tube on ice for 1 h or overnight to elute the GSC-tags. GSC-tags were further retrieved from the gel pieces by filtration on a Micro Spin Column (Amersham) according to the maker's directions, hereby incorporated herein by reference. The GSC-tags should be eluted in a volume of about 700 μl.

After the gel purification step, GSC-tags are further concentrated on a Microcon YM-10 membrane (Millipore) according to the maker's directions, hereby incorporated herein by reference. About 20 μl of eluted DNA should be recovered after this step.

Example 5 Concatenation of GSC-Tags

Individual GSC-tags are ligated into concatemers using their N2 cohesive ends from the MmeI digestion step. Although 16 different overhangs can occur, the complexity of most samples is sufficient to allow for the concatenation of the different GSC-tags. However, in some cases, it can be advisable to blunt the GSC-tags before the concatenation step, although this leads to a shortening of the tags. An example for the blunting of MmeI sites is given below.
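The figure of 16 overhangs corresponds to all dinucleotides (4 × 4). A minimal enumeration is sketched below; it assumes that two N2 ends can anneal when one overhang is the reverse complement of the other, and the variable names are illustrative.

```python
from itertools import product

def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

# all possible 2-nt overhangs
overhangs = ["".join(p) for p in product("ACGT", repeat=2)]

# each overhang pairs with its reverse complement
partner = {o: revcomp(o) for o in overhangs}

# overhangs that can pair with an identical overhang on another tag
self_complementary = [o for o in overhangs if partner[o] == o]
```

Only four of the 16 overhangs (AT, CG, GC, TA) are self-complementary; every other tag end needs a tag carrying the matching partner, which is why sample complexity matters for concatenation.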

For the ligation reaction mix the following components in a 0.2 ml PCR tube:

GSC-tag fragments        X μl (300-500 ng)
10x buffer (Takara)      1 μl
T4 DNA Ligase (Takara)   1 μl
0.1x TE                  Y μl
Total volume             10 μl

Incubate the ligation reaction at 16° C. for 5 min. Note that the ligation reaction should not exceed 5 min. Add 0.5 μl of 10% SDS before inactivating the ligase at 65° C. for 3 min.

To ensure a satisfactory number of GSC-tags within each concatemer, it is advisable, although not essential, to perform a size fractionation of the concatenation products, where we commonly isolate fragments of more than 500 bp.

Size fractionation of concatemers is commonly performed by agarose gel electrophoresis under the following conditions:

Gel: 0.8% SeaPlaque/1× TAE/EtBr+

Buffer: 1× TAE buffer/EtBr+

Run: 50 V, 170 min, at 4° C.

Cut out fragments of about 500 to 700 bp, and elute the DNA as described above. The DNA can be further concentrated using a Micro Spin Column (micron YM-10, Amersham Biosciences).

For the purpose of this example, the concatenation products were blunted for ligation into the vector. Although vectors with N2 overhangs can be prepared, it is preferable to clone blunted concatemers to ensure cloning of all possible combinations. For the blunting reaction, set up the following:

Concatemers                           X μl
H₂O (Invitrogen)                      Y μl
10x buffer (Takara)                   18 μl
0.1% BSA (Takara)                     18 μl
1.7 mM dNTPs (dilute Takara 2.5 mM)   18 μl
Total volume                          162 μl

Incubate at 65° C. for 5 min before placing on ice for 1 min, then add:

4 u/μl T4 DNA Polymerase (Takara)     18 μl (72 u, 4 u/μg DNA)
Total volume                          180 μl (18 μg/180 μl = 100 ng/μl)

Incubate at 37° C. for 5 min in a water bath without water circulation. After the incubation inactivate T4 DNA polymerase by vigorous vortexing for about 10 min. From there proceed by digestion with Proteinase K, and extraction with Phenol/Chloroform and then Chloroform.

Example 6 Preparation of Vector pGSC for Ligation Step

For the purpose of this example the vector pGSC is used to perform the invention; however, the invention can be performed using many other vectors as well. For blunt end ligation of GSC-tags, the vector is digested with the restriction enzyme HpaI. For the digestion the following reaction is set up:

pGSC plasmid DNA       X μl (20 μg)
10x NEBuffer 4 (NEB)   50 μl
HpaI (NEB)             30 μl (5000 u/ml)
H₂O                    Y μl
Total volume           500 μl (40 ng/μl)

Incubate at 37° C. for 2 h, and check an aliquot by gel electrophoresis to ensure complete digestion. If the digestion was complete, purify the linear DNA by Proteinase K digestion, Phenol/Chloroform extraction, Chloroform extraction and ethanol precipitation. The DNA should finally be dissolved in 40 μl H₂O.

To avoid self-ligation of the vector, de-phosphorylation by calf intestine alkaline phosphatase can be advisable. To perform the reaction set up the following:

pGSC/HpaI              40 μl (20 μg, 35.2 pmole)
10x Buffer (Takara)    10 μl
CIP (Takara)           X μl (140 u, 4 u/pmole)
H₂O                    Y μl
Total volume           100 μl

Incubate at 37° C. for 15 min before inactivating the enzyme at 50° C. for 15 min. Purify the DNA by Proteinase K digestion, Phenol/Chloroform extraction, and ethanol precipitation. Finally dissolve the DNA pellet in 80 μl H₂O.

Furthermore, it can be advisable to purify the DNA in an agarose gel under the following conditions:

Sample Preparation:

pGSC/HpaI/CIP   80 μl
6x Dye (TAE)    20 μl
Total volume    100 μl

Gel: 0.8% SeaPlaque/1× TAE/EtBr+, Mupid small gel using wide wells

Buffer: 1× TAE buffer/EtBr+

Run: 35 V, 160 min

After the electrophoresis, cut out the band corresponding to 2,800 bp by comparison to an appropriate size marker using a transilluminator (365 nm). The DNA can be eluted from the gel pieces by the following steps:

Melt gel slices at 65° C. for 5 min, and confirm that all gel pieces have melted completely. Add β-agarase buffer mix (NEB) to some 800 μl solution, and incubate at 42° C. for 5 h. Add 5M NaCl at 1/9 of the reaction volume, and extract with Phenol/Chloroform. Precipitate the DNA out of the aqueous phase with isopropanol, wash twice with 80% ethanol, and dissolve the pellet in 30 μl H₂O. About 5 μg of linearized vector may be obtained, which can be stored at −20° C.

Example 7 Ligation of GSC-Tag-Concatemers into Vector pGSC

Purified concatemers as prepared according to Example 5 are ligated into vector pGSC/HpaI/CIP prepared according to Example 6. Before the ligation reaction, set up the following precipitation to concentrate the DNA:

Concatenated GSC-tags   X μl (~200 ng)
pGSC/HpaI/CIP vector    Y μl (260 ng)
5M NaCl                 Z μl (final concentration 250 mM)
Isopropanol             A μl

Ligation ratio: pGSC vector:Concatenated GSC-tag=1:2 (mol)

Incubate at −20° C. for more than 30 min before collecting the precipitate by centrifugation at 15,000 rpm for 15 min at 4° C. Discard the supernatant and wash the pellet twice with 80% ethanol before dissolving the pellet in 26 μl 0.1× TE buffer. For the ligation reaction set up:

Concatemers/pGSC vector         5 μl
2× Ligation Mix (Nippon Gene)   5 μl
Total volume                    10 μl

Incubate at 16° C. for 30 min, and then inactivate the ligase at 65° C. for 10 min. Commonly, the ligation product is directly used for transformation of bacteria, although it can be advantageous to purify the ligation product for longer storage or to de-salt the reaction mixture for electroporation.

For transformation we commonly use the following setup, although other approaches or bacteria can be used as well at this stage:

Sample: 5 ng/μl, 2 μl
Bacteria: DH10B T1 phage resistant (Invitrogen), 20 μl

Commonly we prefer to use electroporation for the transformation step using a Cell-Porator (Invitrogen) according to the transformation procedures described in the manufacturer's manual, hereby incorporated herein by reference. After electroporation spread some 10 μl of the bacteria on LB medium containing chloramphenicol (12.5 μg/ml). Individual colonies can be obtained after overnight growth at 37° C. Remaining bacteria not plated onto the selective medium can be stored as glycerol stocks at −80° C.

Example 8 Insert Size Check for GSC-Tag Libraries

It can be of value to check the average insert size of the GSC-tag libraries before initiating high-throughput sequencing. The insert size of GSC-libraries can be determined with the following reaction setup:

Plasmid                  X μl (200 ng)
10x NEB Buffer 2 (NEB)   2 μl
100x BSA (NEB)           0.2 μl
20 u/μl XbaI             0.2 μl (4 u)
H₂O                      Y μl
Total volume             20 μl (10 ng/μl)

Incubate at 37° C. for 2 h, and take an aliquot for agarose gel electrophoresis:

Sample DNA               5 μl
0.1x TE                  5 μl
6x Dye (for TBE)         2 μl
Total volume             12 μl

Gel: 1% Agarose (EtBr+, 1× TBE), Mupid gel

Buffer: 1×TBE buffer

Electrophoresis system: Mupid

Run: 100 V, 30 min

Example 9 Purification of Oligonucleotides for Library Preparation

Oligonucleotides as used in these Examples were obtained from Invitrogen, and were purified before use by 10% polyacrylamide/7M Urea/1× TBE gel electrophoresis.

Example 10 Capture of PCR Products by Streptavidin Coated Magnetic Beads

In cases where biotinylated linkers or PCR primers have been used, reaction products can be attached to magnetic beads via a streptavidin/biotin interaction. Commonly, we use Takara MAGNOTEX-SA (Takara) according to the maker's directions, hereby incorporated herein by reference. For sample preparation mix the following:

Purified PCR product         100 μl (~5 μg)
2x Binding Buffer (Takara)   100 μl
Total                        200 μl

Magnetic beads should be prepared from the slurry: place 150 μl of MAGNOTEX-SA on a magnetic stand for 2 min, remove the supernatant, then add 200 μl of 1x Binding Buffer. Vortex gently, apply magnetic force, remove the supernatant, and repeat the washing step with 2× Binding Buffer (Takara); then replace the 2× Binding Buffer by 1× Binding Buffer.

Add some 200 μl of PCR product to the magnetic beads, and incubate for 15 min at room temperature under ongoing agitation. Apply the magnetic force, remove the supernatant, and wash the magnetic beads three times with 250 μl of 1× Binding Buffer.

cDNA fragments are released from the beads by digestion with an appropriate restriction endonuclease. For the purpose of this example, the enzyme XmaJI was used under the same conditions as described in Example 3.

Example 11 Determination of End-Sequences

After the titer check, bacterial clones were collected by commercially available picking machines (Q-bot and Q-pix; Genetix) and transferred to 384-microwell plates. Transformed E. coli clones holding vector DNA were divided from 384-microwell plates and grown in four 96-well plates. After overnight growth, plasmids were extracted either manually (Itoh M. et al., Nucleic Acids Res. 25 (1997) 1315-1316, hereby incorporated herein by reference) or automatically (Itoh M. et al., Genome Res. 9 (1999) 463-470, hereby incorporated herein by reference). Sequences were typically run on a RISA sequencing unit (Shimadzu) or a Perkin Elmer-Applied Biosystems ABI 3700 in accordance with standard sequencing methodologies such as described by Shibata K. et al., Genome Res. 10 (2000) 1757-1571, hereby incorporated herein by reference. Sequencing was alternatively performed using primers nested in the flanking regions of the cloning vector and a BigDye Terminator Cycle Sequencing Ready Reaction Kit v1.1 (Applied Biosystems, Cat. No. 4337449) and an ABI 3700 (Applied Biosystems) sequencer according to the manufacturer's product descriptions, hereby incorporated herein by reference.

Standard primers as used for vectors of the pFLC or pGSC family included:

M13 Reverse primer: 5′-CAGGAAACAGCTATGAC (SEQ ID NO: 7)
M13 (−20) Forward primer: 5′-GTAAAACGACGGCCAG (SEQ ID NO: 8)

Example 12 Characterization of Sequence Tags

Individual sequence tags can be analyzed for their identity by standard software solutions for sequence alignment, such as NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/), FASTA, available in the Genetics Computer Group (GCG) package from Accelrys Inc. (http://www.accelrys.com/), or the like. Such software solutions allow for an alignment of specific sequence tags against one another to identify unique or non-redundant tags, which can be further used in database searches.
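Before tags can be compared, they have to be recovered from the concatemer reads. The following sketch shows one possible way to do this; it is not the software used for the Examples. It assumes that each tag carries the 18 bp spacer formed by the MmeI and XmaJI sites of the adaptors in Example 3, and that about 20 bases on either side of the spacer derive from the cDNA ends. The function name and the default flank length are illustrative.

```python
import re

# assumed spacer: inverted MmeI site, XmaJI site, MmeI site
SPACER = "GTCGGACCTAGGTCCGAC"

def extract_tags(read, flank=20, spacer=SPACER):
    """Return (left_flank, right_flank) pairs for every spacer in a read."""
    read = read.upper()
    tags = []
    for match in re.finditer(re.escape(spacer), read):
        left = read[max(0, match.start() - flank):match.start()]
        right = read[match.end():match.end() + flank]
        tags.append((left, right))
    return tags
```

Because the assumed spacer reads the same on both strands, the two flanks are reported as left and right rather than as 5′ and 3′ ends; assigning them requires additional information such as mapping to the genome. Flank lengths are approximate, since MmeI cleavage is not always precise.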

Example 13 Mapping of Sequencing Tags to the Genome

Specific sequence tags obtained as described in these Examples can be used to identify transcribed regions within genomes for which partial or entire sequences have been obtained. Such a search can be performed using standard software solutions like NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) to align specific sequence tags to genomic sequences. In the case of large genomes like those of human, rat or mouse it may be necessary to extend the initial sequence information obtained from concatemers. The use of extended sequences allows for a more precise identification of actively transcribed regions in the genome.
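The principle of mapping a tag to a genomic sequence can be illustrated with an exact-match search on both strands. This is a toy stand-in for an alignment tool such as BLAST, which also tolerates mismatches; the function name and the 0-based coordinates are assumptions made for the sketch.

```python
def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]

def map_tag(tag, genome):
    """Return sorted (start, strand) hits of exact matches of tag in genome."""
    hits = []
    for strand, query in (("+", tag), ("-", revcomp(tag))):
        start = genome.find(query)
        while start != -1:
            hits.append((start, strand))
            # continue one base further to catch overlapping matches
            start = genome.find(query, start + 1)
    return sorted(hits)
```

A tag with a single hit can be assigned to one locus; tags with several hits illustrate why extended sequence information may be needed for large genomes.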

Example 14 Statistical Analysis of Sequence Tags

Sequence tags obtained from the same plurality of mRNAs in a sample or nucleic acid fragments within the same cDNA library can be analyzed by a standard software solution like NCBI BLAST (http://www.ncbi.nlm.nih.gov/BLAST/) to identify non-redundant sequence tags. All such non-redundant sequence tags can then be individually counted and further analyzed for the contribution of each non-redundant tag to the total number of all tags obtained from the same sample. The contribution of an individual tag to the total number of all tags should allow for a quantification of the transcripts in a plurality of mRNAs in the sample or a cDNA library. The results obtained in such a way on individual samples can be further compared with similar data obtained from other samples to compare their expression patterns.
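The counting step can be sketched as follows, assuming that tags have already been reduced to comparable strings (for example by exact sequence identity, which is a simplification of an alignment-based grouping). The function name is illustrative.

```python
from collections import Counter

def tag_frequencies(tags):
    """Map each non-redundant tag to (count, fraction of all tags)."""
    counts = Counter(tags)
    total = sum(counts.values())
    # most_common() orders tags from most to least abundant
    return {tag: (n, n / total) for tag, n in counts.most_common()}
```

The fraction obtained for a tag in one sample can then be compared with the fraction of the same tag in another sample to compare expression patterns.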

Example 15 Identification of Transcriptional Start Sites

5′ end specific sequence tags that can be mapped to genomic sequences allow for the identification of regulatory sequences. In a gene the DNA upstream of the 5′ end of transcribed regions usually encompasses most of the regulatory elements, which are used in the control of gene expression. These regulatory sequences can be further analyzed for their functionality by searches in databases, which hold information on binding sites for transcription factors. Publicly available databases on transcription factor binding sites and for promoter analysis include:

Transcription Regulatory Region Database (TRRD) (http://www.mgs.bionet.nsc.ru/mgs/dbases/trrd4/)

TRANSFAC (http://transfac.gbf.de/TRANSFAC/)

TFSEARCH (http://www.cbrc.jp/research/db/TFSEARCH.html)

PromoterInspector provided by Genomatix Software (http://www.genomatix.de/)
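Once a 5′ end tag has been mapped, the region to be searched for regulatory elements can be derived from the mapped position and strand. The sketch below assumes 0-based, half-open coordinates and an arbitrary window of 500 bp; both the function name and the window size are illustrative choices, not values taken from the Examples.

```python
def upstream_window(tss, strand, length=500, chrom_length=None):
    """Return (start, end) of the region upstream of a mapped 5' end.

    tss is the 0-based position of the first transcribed base;
    the returned interval is half-open and excludes the tss itself.
    """
    if strand == "+":
        start, end = max(0, tss - length), tss
    elif strand == "-":
        start, end = tss + 1, tss + 1 + length
        if chrom_length is not None:
            end = min(end, chrom_length)
    else:
        raise ValueError("strand must be '+' or '-'")
    return start, end
```

The sequence within the returned interval would then be submitted to one of the databases listed above.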

1. A method for preparing DNA fragments comprising sequences corresponding to two opposite end regions of a linear nucleic acid molecule, comprising the steps of: (a) creating a linear DNA molecule from a nucleic acid molecule; (b) ligating linkers to two opposite ends of the linear DNA molecule, wherein the linkers contain a cloning site and a recognition site for a restriction endonuclease that cleaves at a site outside its recognition site and within the linear DNA molecule; (c) circularizing the linear DNA molecule by closing the linear DNA molecule at the cloning site with the linkers so as to form a circular DNA molecule; (d) digesting the circular DNA molecule with the restriction endonuclease so as to cut out a DNA fragment from the circular DNA molecule, wherein the DNA fragment comprises opposite end regions of the linear DNA molecule; and (e) isolating the DNA fragment.
2-47. (canceled)
48. The method according to claim 1, wherein the nucleic acid molecule is selected from the group consisting of a DNA, cDNA, genomic DNA, RNA, mRNA having poly(A) tail, mRNA lacking poly(A) tail and any mixture thereof.
49. The method according to claim 48, wherein the nucleic acid molecule of step (a) is mRNA having poly(A) tail, and wherein step (a) comprises converting the mRNA into a complementary DNA by the means of a reverse transcriptase and a primer, wherein the primer contains a Class IIS or Class III recognition site for removing stretches of oligo-dT used in the priming of the reverse transcription reaction from the RNA, which is an RNA having poly(A) tail.
50. A method for preparing DNA fragments comprising sequences corresponding to two opposite end regions of an RNA, comprising the steps of: (a) creating a linear DNA molecule from an RNA; (b) ligating linkers to two opposite ends of the linear DNA molecule, wherein the linkers contain a cloning site and a recognition site for a restriction endonuclease that cleaves at a site outside its recognition site and within the linear DNA molecule; (c) circularizing the linear DNA molecule by closing the linear DNA molecule at the cloning site with the linkers so as to form a circular DNA molecule; (d) digesting the circular DNA molecule with the restriction endonuclease so as to cut out a DNA fragment from the circular DNA molecule, wherein the DNA fragment comprises opposite end regions of the linear DNA molecule; and (e) isolating the DNA fragment, wherein step a) above comprises: (i) preparing a double-stranded linker having a single-stranded overhanging region, wherein the single-stranded overhanging region is complementary to the 3′-end sequence of the RNA; (ii) hybridizing the single-stranded overhanging region to the 3′-end sequence of the RNA so as to ligate the double-stranded linker to the 3′-end of the RNA, (iii) extending a strand complement to the RNA from the 3′ end of the overhang region of the linker with a reverse transcriptase and (iv) separating a linear DNA molecule from the reverse transcription product.
51. The method according to claim 50, wherein the RNA is enriched by the Cap Trapper method or Oligo capping method, and thereby a full length cDNA is prepared in step a).
52. The method according to claim 49, wherein any complementary sequences derived from a poly(A) tail of the mRNA are removed from the linear cDNA molecule.
53. The method according to claim 1, wherein the restriction endonuclease is selected from the group consisting of the Class IIS, Class IIG, Class III restriction enzymes, Gsu I, MmeI, Bpm I, Bsg I, EcoP15I, and any mixture thereof.
54. The method according to claim 1, wherein the linkers are attached to a selective binding substance to allow for enrichment by such binding.
55. The method according to claim 54, wherein the selective binding substance is selected from the group consisting of biotin and digoxigenin, and a high affinity binding substance bound to the selective binding substance is selected from the group consisting of avidin, streptavidin, a derivative of avidin or streptavidin, and an anti-digoxigenin antibody.
56. The method according to claim 1, where at least one of the linkers contains sequence elements used for labelling the DNA fragment.
57. The method according to claim 1, wherein the linear DNA fragments are removed from the circular DNA molecule by the means of an exonuclease.
58. The method according to claim 57, wherein the exonuclease is exonuclease III, exonuclease I, or any mixture thereof.
59. The method according to claim 1, further comprising the step of amplifying the circular DNA molecule.
60. The method according to claim 59, wherein the step of amplifying the circular DNA molecule is a rolling circle reaction.
61. A method for preparing a concatemer, comprising ligating the DNA fragments to each other, wherein the DNA fragments are prepared by the method of claim 1.
62. Vector pGSC.
63. A method for obtaining information on the end sequences of a linear nucleic acid molecule, comprising some or all steps of: preparing the DNA fragments by the method according to claim 1, preparing a concatemer by ligating the DNA fragments to each other, and sequencing the concatemer so as to obtain information on the end sequences of the linear nucleic acid molecule.
64. The method according to claim 1, wherein the DNA fragment is derived from a mixed sample.
65. The method according to claim 64, wherein the origin of the DNA fragment in the mixed sample can be tracked by a label which is a short specific sequence in a spacer which is derived from the linker sequences.
66. A method for priming a reverse transcription reaction, comprising the steps of: (a) preparing a double-stranded linker having a single-stranded overhanging region, wherein the single-stranded overhanging region is complementary to a 3′-end sequence of an RNA; (b) hybridizing the single-stranded overhanging region to the 3′-end sequence of an RNA so as to ligate the double-stranded linker to the 3′-end of the RNA; and (c) extending a strand complement to the RNA from the 3′ end of the overhang region of the linker with a reverse transcriptase.
67. The method according to claim 66, wherein the overhanging part of the linker is comprised of oligo-dT.
68. The method according to claim 66, wherein the overhang part of the linker has random sequence.
69. A method for separating an mRNA having poly(A) tail and an mRNA having no poly(A) tail, comprising the steps of: (a) preparing double-stranded linkers having a single-stranded overhanging region, wherein the overhang region of the first linker has oligo-dT and wherein the 3′-end of the oligo-dT overhang region is blocked and wherein the overhang region of the second linker has a random sequence and the 3′-end of the random sequence is not blocked; (b) hybridizing the single-stranded overhanging regions to the 3′-end sequence of an RNA so as to ligate the double-stranded linker to the 3′-end of the RNA in one or more ligation reactions; (c) performing the reverse transcription reaction so that a strand is extended from the 3′ overhang region of the second linker; (d) selecting the RNA ligated to the first double-stranded linker; and (e) separating a linear DNA molecule from the reverse transcription product derived from the second linker.
70. The method according to claim 66, wherein the linker is attached to a selective binding substance used for the fractionation of RNAs.
71. The method according to claim 66, further comprising the step of attaching the linker to a high affinity selective binding substance so as to allow for enrichment.
72. The method according to claim 71, where the selective binding substance is selected from the group consisting of biotin and digoxigenin, and a high affinity selective binding substance bound to the selective binding substance is selected from the group consisting of avidin, streptavidin, a derivative of avidin or streptavidin, or an anti-digoxigenin antibody.